<<<<<<< HEAD Assignment 2: Spatial Analysis and Visualization

Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Yanyang Chen

Published

October 15, 2025

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences


Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages
library(sf)
library(tidyverse)
library(tidycensus)
library(tigris)

# Load spatial data
counties <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/Pennsylvania_County_Boundaries.shp")
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
counties <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/Pennsylvania_County_Boundaries.shp")
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
census_api_key("940dffa67b928a0518accaf8839aa7b4762b11ab")
census_tracts <- tracts(state = "PA", cb = TRUE)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |======================================================================| 100%
hospitals <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/hospitals.geojson")
Reading layer `hospitals' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84
# 3. Pennsylvania census tracts
# Check that all data loaded correctly
head(counties)
Simple feature collection with 6 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8905670 ymin: 4862594 xmax: -8317930 ymax: 5161279
Projected CRS: WGS 84 / Pseudo-Mercator
  OBJECTID MSLINK COUNTY_NAM COUNTY_NUM FIPS_COUNT COUNTY_ARE COUNTY_PER
1      336     46 MONTGOMERY         46        091       <NA>       <NA>
2      337      8   BRADFORD         08        015       <NA>       <NA>
3      338      9      BUCKS         09        017       <NA>       <NA>
4      339     58      TIOGA         58        117       <NA>       <NA>
5      340     59      UNION         59        119       <NA>       <NA>
6      341     60    VENANGO         60        121       <NA>       <NA>
  NUMERIC_LA COUNTY_N_1 AREA_SQ_MI SOUND SPREAD_SHE IMAGE_NAME NOTE_FILE VIDEO
1          5         46   487.4271  <NA>       <NA>   poll.bmp      <NA>  <NA>
2          2          8  1161.3379  <NA>       <NA>   poll.bmp      <NA>  <NA>
3          5          9   622.0836  <NA>       <NA>   poll.bmp      <NA>  <NA>
4          2         58  1137.2480  <NA>       <NA>   poll.bmp      <NA>  <NA>
5          2         59   319.1893  <NA>       <NA>   poll.bmp      <NA>  <NA>
6          3         60   683.3676  <NA>       <NA>   poll.bmp      <NA>  <NA>
  DISTRICT_N PA_CTY_COD MAINT_CTY_ DISTRICT_O                       geometry
1         06         46          4        6-4 MULTIPOLYGON (((-8398884 48...
2         03         08          9        3-9 MULTIPOLYGON (((-8558633 51...
3         06         09          1        6-1 MULTIPOLYGON (((-8367360 49...
4         03         59          7        3-7 MULTIPOLYGON (((-8558633 51...
5         03         60          8        3-8 MULTIPOLYGON (((-8562865 49...
6         01         61          5        1-5 MULTIPOLYGON (((-8870781 50...
head(hospitals)
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.27907 ymin: 39.80913 xmax: -75.17005 ymax: 40.24273
Geodetic CRS:  WGS 84
                 CHIEF_EXEC              CHIEF_EX_1
1             Peter J Adamo               President
2          Autumn DeShields Chief Executive Officer
3              Shawn Parekh Chief Executive Officer
4               DIANE HRITZ Chief Executive Officer
5            Tim Harclerode Chief Executive Officer
6 Richard McLaughlin MD MBA Chief Executive Officer
                      FACILITY_U LONGITUDE       COUNTY
1   https://www.phhealthcare.org -79.91131   Washington
2      https://www.malvernbh.com -75.17005 Philadelphia
3 https://roxboroughmemorial.com -75.20963 Philadelphia
4     https://www.ashospital.net -80.27907   Washington
5      https://www.conemaugh.org -79.02513     Somerset
6        https://towerhealth.org -75.61213   Montgomery
                               FACILITY_N                         STREET
1               Penn Highlands Mon Valley         1163 Country Club Road
2               MALVERN BEHAVIORAL HEALTH 1930 South Broad Street Unit 4
3            Roxborough Memorial Hospital              5800 Ridge Avenue
4              ADVANCED SURGICAL HOSPITAL       100 TRICH DRIVE\nSUITE 1
5 DLP Conemaugh Meyersdale Medical Center             200 Hospital Drive
6                 Pottstown Hospital, LLC          1600 East High Street
    CITY_OR_BO LATITUDE   TELEPHONE_ ZIP_CODE                   geometry
1  Monongahela 40.18193 724-258-1000    15063 POINT (-79.91131 40.18193)
2 Philadelphia 39.92619 610-480-8919    19145  POINT (-75.17005 39.9262)
3 Philadelphia 40.02869 215-483-9900    19128 POINT (-75.20963 40.02869)
4   WASHINGTON 40.15655   7248840710    15301 POINT (-80.27907 40.15655)
5   Meyersdale 39.80913 814-634-5911    15552 POINT (-79.02513 39.80913)
6    Pottstown 40.24273   6103277000    19464 POINT (-75.61213 40.24273)
head(census_tracts)
Simple feature collection with 6 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS:  NAD83
  STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID   NAME
1      42      001  031101 1400000US42001031101 42001031101 311.01
2      42      013  100400 1400000US42013100400 42013100400   1004
3      42      013  100500 1400000US42013100500 42013100500   1005
4      42      013  100800 1400000US42013100800 42013100800   1008
5      42      013  101900 1400000US42013101900 42013101900   1019
6      42      011  011200 1400000US42011011200 42011011200    112
             NAMELSAD STUSPS   NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1 Census Tract 311.01     PA Adams County Pennsylvania   CT 3043185      0
2   Census Tract 1004     PA Blair County Pennsylvania   CT  993724      0
3   Census Tract 1005     PA Blair County Pennsylvania   CT 1130204      0
4   Census Tract 1008     PA Blair County Pennsylvania   CT  996553      0
5   Census Tract 1019     PA Blair County Pennsylvania   CT  573726      0
6    Census Tract 112     PA Berks County Pennsylvania   CT 1539365   9308
                        geometry
1 MULTIPOLYGON (((-77.03108 3...
2 MULTIPOLYGON (((-78.42478 4...
3 MULTIPOLYGON (((-78.41661 4...
4 MULTIPOLYGON (((-78.41067 4...
5 MULTIPOLYGON (((-78.40836 4...
6 MULTIPOLYGON (((-75.95433 4...

Questions to answer: - How many hospitals are in your dataset? There are 223 hospital in my dataset. - How many census tracts? 3345 - What coordinate reference system is each dataset in? NAD 83


Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
pa_demo <- get_acs(
  geography = "tract",
  state = "PA",
  variables = c(
    total_pop = "B01003_001",       # 总人口
    median_income = "B19013_001",   # 家庭收入中位数
    age_65_66 = "B01001_020",       # 男 65-66岁
    age_67_69 = "B01001_021",       # 男 67-69岁
    age_70_74 = "B01001_022",       # 男 70-74岁
    age_75_79 = "B01001_023",       # 男 75-79岁
    age_80_84 = "B01001_024",       # 男 80-84岁
    age_85_plus = "B01001_025",     # 男 85岁+
    age_65_66_f = "B01001_044",     # 女 65-66岁
    age_67_69_f = "B01001_045",     # 女 67-69岁
    age_70_74_f = "B01001_046",     # 女 70-74岁
    age_75_79_f = "B01001_047",     # 女 75-79岁
    age_80_84_f = "B01001_048",     # 女 80-84岁
    age_85_plus_f = "B01001_049"    # 女 85岁+
  ),
  geometry = TRUE,
  year = 2021
)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
# Step 2: Summarize population 65+ ----
pa_demo_clean <- pa_demo |>
  select(GEOID, variable, estimate) |>
  pivot_wider(names_from = variable, values_from = estimate) |>
  mutate(
    pop_65_over = age_65_66 + age_67_69 + age_70_74 + age_75_79 +
                  age_80_84 + age_85_plus + age_65_66_f + age_67_69_f +
                  age_70_74_f + age_75_79_f + age_80_84_f + age_85_plus_f
  ) |>
  select(GEOID, total_pop, median_income, pop_65_over)

# 查看结果
head(pa_demo_clean)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.08415 ymin: 39.92397 xmax: -75.09795 ymax: 40.48008
Geodetic CRS:  NAD83
# A tibble: 6 × 5
  GEOID       total_pop median_income pop_65_over                       geometry
  <chr>           <dbl>         <dbl>       <dbl>             <MULTIPOLYGON [°]>
1 42017104204      5256        112691         487 (((-75.18818 40.36427, -75.16…
2 42101006600      3536         27049         452 (((-75.23699 39.92782, -75.23…
3 42101018802      4200         31523          48 (((-75.10644 40.00032, -75.10…
4 42017102600      2166         63393         626 (((-75.33392 40.33027, -75.33…
5 42003462600      3637         44315         535 (((-80.08415 40.47571, -80.08…
6 42101011100      3829         43235         422 (((-75.23017 39.97879, -75.22…
# Join to tract boundaries
pa_tracts <- tracts(state = "PA", cb = TRUE)

# ✅ 正确写法:去掉右侧的几何列再 join
pa_joined <- pa_tracts %>%
  left_join(st_drop_geometry(pa_demo_clean), by = "GEOID")

Questions to answer: - What year of ACS data are you using? 2021 - How many tracts have missing income data? 66
- What is the median income across all PA census tracts? 65195.5


Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
pa_demo_clean <- pa_demo_clean %>%
  mutate(
    pct_elderly = (pop_65_over / total_pop) * 100
  )

income_threshold <- median(pa_demo_clean$median_income, na.rm = TRUE) * 0.8  # 比全州中位数低 20%
elderly_threshold <- 20

vulnerable_tracts <- pa_demo_clean %>%
  filter(median_income < income_threshold & pct_elderly > elderly_threshold)


vulnerable_tracts <- pa_demo_clean %>%
  filter(median_income < income_threshold & pct_elderly > elderly_threshold)

nrow(vulnerable_tracts)
[1] 278
head(vulnerable_tracts)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -79.89962 ymin: 40.04802 xmax: -75.03866 ymax: 40.86881
Geodetic CRS:  NAD83
# A tibble: 6 × 6
  GEOID       total_pop median_income pop_65_over                       geometry
  <chr>           <dbl>         <dbl>       <dbl>             <MULTIPOLYGON [°]>
1 42101033300      3790         49505        1210 (((-75.0515 40.05046, -75.050…
2 42087960900      2327         33750         505 (((-77.57825 40.59525, -77.57…
3 42011001900      1833         19336         434 (((-75.91819 40.33353, -75.91…
4 42077001900      4414         34297        1301 (((-75.50088 40.61841, -75.49…
5 42003515300      1553         25833         397 (((-79.89887 40.41885, -79.89…
6 42097082100      2463         29342         549 (((-76.8015 40.8547, -76.7978…
# ℹ 1 more variable: pct_elderly <dbl>

Questions to answer: - What income threshold did you choose and why? I chose an income threshold equal to 80% of the statewide median household income (approximately $54,000) to identify tracts that fall substantially below the state’s overall income level. - What elderly population threshold did you choose and why? I defined tracts with more than 20% of residents aged 65 and over as having a significant elderly population, since this represents roughly the top quartile of tracts in Pennsylvania in terms of elderly share. - How many tracts meet your vulnerability criteria? - What percentage of PA census tracts are considered vulnerable by your definition? These tracts account for approximately 6.7% of all Pennsylvania census tracts, according to my criteria.


Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

library(sf)
library(tidyverse)

vul_proj <- st_transform(vulnerable_tracts, 26918)
hosp_proj <- st_transform(hospitals, 26918)

tract_centroids <- st_centroid(vul_proj)
dist_matrix <- st_distance(tract_centroids, hosp_proj)

nearest_dist_m <- apply(dist_matrix, 1, min)

vul_proj <- vul_proj %>%
  mutate(
    dist_to_hosp_m = as.numeric(nearest_dist_m),
    dist_to_hosp_mi = dist_to_hosp_m / 1609.34
  )

avg_dist <- mean(vul_proj$dist_to_hosp_mi, na.rm = TRUE)
max_dist <- max(vul_proj$dist_to_hosp_mi, na.rm = TRUE)
over15 <- sum(vul_proj$dist_to_hosp_mi > 15)

avg_dist
[1] 5.151893
max_dist
[1] 27.78481
over15
[1] 16

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? ≈ 5.8 miles - What is the maximum distance? ≈ 27.4 miles - How many vulnerable tracts are more than 15 miles from the nearest hospital 12 tracts


Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable
vul_proj <- vul_proj %>%
  mutate(
    underserved = dist_to_hosp_mi > 15
  )

table(vul_proj$underserved)

FALSE  TRUE 
  262    16 

Questions to answer: - How many tracts are underserved? 16

  • What percentage of vulnerable tracts are underserved? 5.76%

  • Does this surprise you? Why or why not? In my opinion I feel confused about the distance we’ve defined as a undersevred distance.It would be good if we use 10 miles to the defenition of “undersevrved”.But only about 5.8% of vulnerable census tracts are located more than 15 miles from the nearest hospital, indicating that approximately 94%–95% of vulnerable areas are relatively close to hospital facilities.


Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

vul_proj <- st_transform(vul_proj, st_crs(counties))
counties <- st_transform(counties, st_crs(vul_proj))

tracts_with_county <- st_join(vul_proj, counties, join = st_intersects)


# Aggregate statistics by county

county_summary <- tracts_with_county %>%
  st_drop_geometry() %>%  # 移除几何信息以便聚合
  group_by(COUNTY_NAME = COUNTY_NAM) %>%  # 如果你的字段叫 COUNTYNAME 或 NAME,请根据实际修改
  summarise(
    total_tracts = n(),
    vulnerable_tracts = sum(!is.na(median_income)),  # 或根据你定义的vulnerable集调整
    underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = (underserved_tracts / total_tracts) * 100,
    avg_dist_miles = mean(dist_to_hosp_mi, na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

head(county_summary)
# A tibble: 6 × 6
  COUNTY_NAME total_tracts vulnerable_tracts underserved_tracts pct_underserved
  <chr>              <int>             <int>              <int>           <dbl>
1 PERRY                  2                 2                  2           100  
2 CLINTON                3                 3                  2            66.7
3 SULLIVAN               3                 3                  2            66.7
4 BRADFORD               4                 4                  2            50  
5 CAMERON                6                 6                  3            50  
6 COLUMBIA               2                 2                  1            50  
# ℹ 1 more variable: avg_dist_miles <dbl>

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? 1.PERRY 2.CLINTON 3.SULLIVAN 4.BRADFORD 5.ACMERON

  • Which counties have the most vulnerable people living far from hospitals?
  • Are there any patterns in where underserved counties are located?

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

library(dplyr)
library(knitr)
library(scales)

county_summary <- tracts_with_county %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAME = COUNTY_NAM) %>%
  summarise(
    num_vulnerable_tracts = n(),
    num_underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = (num_underserved_tracts / num_vulnerable_tracts) * 100,
    avg_dist_miles = mean(dist_to_hosp_mi, na.rm = TRUE),
    total_vulnerable_pop = sum(total_pop, na.rm = TRUE),
    underserved_pop = sum(if_else(underserved, total_pop, 0), na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

county_summary %>%
  mutate(
    `# Vulnerable Tracts` = num_vulnerable_tracts,
    `# Underserved Tracts` = num_underserved_tracts,
    `% Underserved` = percent(pct_underserved / 100, accuracy = 0.1),
    `Avg. Distance (mi)` = round(avg_dist_miles, 2),
    `Total Vulnerable Pop.` = comma(total_vulnerable_pop),
    `Underserved Pop.` = comma(underserved_pop)
  ) %>%
  select(
    County = COUNTY_NAME,
    `# Vulnerable Tracts`,
    `# Underserved Tracts`,
    `% Underserved`,
    `Avg. Distance (mi)`,
    `Total Vulnerable Pop.`,
    `Underserved Pop.`
  ) %>%
  kable(
    caption = "County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania"
  )
County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania
County # Vulnerable Tracts # Underserved Tracts % Underserved Avg. Distance (mi) Total Vulnerable Pop. Underserved Pop.
PERRY 2 2 100.0% 17.53 5,815 5,815
CLINTON 3 2 66.7% 13.84 7,750 4,615
SULLIVAN 3 2 66.7% 18.28 6,949 3,031
BRADFORD 4 2 50.0% 14.14 14,748 7,562
CAMERON 6 3 50.0% 14.09 13,466 6,763
COLUMBIA 2 1 50.0% 9.45 5,897 970
DAUPHIN 2 1 50.0% 10.01 5,838 4,028
JUNIATA 2 1 50.0% 12.56 5,461 1,787
ELK 8 3 37.5% 12.71 19,260 8,045
CLEARFIELD 11 4 36.4% 13.47 36,592 10,359
CENTRE 3 1 33.3% 15.08 11,735 2,167
FOREST 3 1 33.3% 14.09 6,031 2,603
POTTER 6 2 33.3% 10.29 18,675 4,615
BEDFORD 4 1 25.0% 11.45 17,181 4,021
CLARION 8 2 25.0% 12.40 22,104 7,149
FRANKLIN 4 1 25.0% 5.61 14,268 1,787
HUNTINGDON 4 1 25.0% 11.41 12,266 1,787
MIFFLIN 4 1 25.0% 10.28 10,268 1,787
MONROE 4 1 25.0% 12.07 8,388 1,134
CRAWFORD 6 1 16.7% 8.37 16,001 2,661
JEFFERSON 6 1 16.7% 9.40 16,311 4,546
LYCOMING 6 1 16.7% 8.27 21,547 970
TIOGA 6 1 16.7% 10.30 21,484 1,953
VENANGO 6 1 16.7% 10.49 14,597 2,603
MCKEAN 7 1 14.3% 10.04 16,897 2,662
ARMSTRONG 8 1 12.5% 10.74 23,310 4,546
SOMERSET 8 1 12.5% 9.24 24,350 4,021
INDIANA 9 1 11.1% 7.36 31,370 4,546
WARREN 9 1 11.1% 7.01 27,976 2,603
NORTHUMBERLAND 10 1 10.0% 11.66 33,370 4,028
LUZERNE 14 1 7.1% 5.30 35,497 970
ALLEGHENY 47 0 0.0% 2.45 109,061 0
BEAVER 8 0 0.0% 4.34 17,930 0
BERKS 2 0 0.0% 2.95 4,073 0
BLAIR 5 0 0.0% 2.97 14,939 0
BUTLER 3 0 0.0% 8.60 9,545 0
CAMBRIA 14 0 0.0% 6.35 34,583 0
CARBON 3 0 0.0% 8.14 7,325 0
CUMBERLAND 2 0 0.0% 1.64 6,734 0
DELAWARE 7 0 0.0% 1.27 26,294 0
ERIE 7 0 0.0% 2.36 20,784 0
FAYETTE 16 0 0.0% 4.05 50,911 0
FULTON 1 0 0.0% 10.64 3,354 0
LACKAWANNA 10 0 0.0% 4.01 29,208 0
LANCASTER 2 0 0.0% 6.60 8,186 0
LAWRENCE 5 0 0.0% 5.80 12,399 0
LEBANON 2 0 0.0% 3.61 8,456 0
LEHIGH 4 0 0.0% 2.09 11,383 0
MERCER 5 0 0.0% 4.02 16,963 0
MONTGOMERY 3 0 0.0% 1.12 7,081 0
MONTOUR 1 0 0.0% 0.56 4,282 0
NORTHAMPTON 4 0 0.0% 2.41 11,585 0
PHILADELPHIA 21 0 0.0% 1.00 73,982 0
PIKE 2 0 0.0% 14.14 3,903 0
SCHUYLKILL 6 0 0.0% 4.38 17,988 0
SUSQUEHANNA 1 0 0.0% 5.79 1,658 0
UNION 3 0 0.0% 4.15 16,364 0
WASHINGTON 5 0 0.0% 4.66 11,865 0
WAYNE 4 0 0.0% 8.73 9,971 0
WESTMORELAND 22 0 0.0% 4.05 54,779 0
YORK 4 0 0.0% 1.44 11,785 0

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)


Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

library(sf)
library(tidyverse)
library(ggplot2)
library(viridis)    
library(scales)      

counties_proj <- st_transform(counties, st_crs(vul_proj))
hospitals_proj <- st_transform(hospitals, st_crs(counties_proj))

county_map <- counties_proj %>%
  left_join(county_summary, by = c("COUNTY_NAM" = "COUNTY_NAME"))

ggplot() +
  
  geom_sf(data = county_map,
          aes(fill = pct_underserved),
          color = "white", size = 0.3) +
  
  geom_sf(data = hospitals_proj,
          shape = 21, color = "black", fill = "red", size = 2, alpha = 0.8)+

  scale_fill_viridis(
    name = "% Underserved Vulnerable Tracts",
    option = "magma",
    direction = -1,
    labels = function(x) paste0(round(x, 1), "%")
  ) +
  
   labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Counties colored by percentage of vulnerable tracts that are underserved (>15 miles from nearest hospital)",
    caption = "Data sources: ACS 2021 (via tidycensus), hospital locations (PA DOH), analysis by Yanyang Chen"
  ) +

   theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, margin = margin(b = 8)),
    plot.caption = element_text(size = 8, color = "gray40")
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels


Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map
library(sf)
library(tidyverse)
library(ggplot2)

tracts_proj <- st_transform(vul_proj, st_crs(counties))
counties_proj <- st_transform(counties, st_crs(tracts_proj))
hospitals_proj <- st_transform(hospitals, st_crs(tracts_proj))

ggplot() +
  
  geom_sf(data = counties_proj, fill = NA, color = "gray60", size = 0.3) +
  
  geom_sf(data = tracts_proj, aes(geometry = geometry),
          fill = "lightgray", color = NA) +
  
  geom_sf(data = filter(tracts_proj, underserved == TRUE),
          aes(geometry = geometry), fill = "#d73027", color = NA, alpha = 0.9) +
  
  geom_sf(data = hospitals_proj,
          shape = 21, fill = "yellow", color = "black", size = 2, alpha = 0.9) +
  
  labs(
    title = "Map 2. Detailed Tract-Level Vulnerability in Pennsylvania",
    subtitle = "Underserved vulnerable tracts (>15 miles from nearest hospital) shown in red",
    caption = "Data sources: ACS 2021 via tidycensus; Hospital data: PA Department of Health"
  ) +
  
  theme_void() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "black"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.background = element_rect(fill = "white", color = NA)
  )

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle


Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization
library(ggplot2)
library(scales)

# Step: Distribution of distances for vulnerable tracts ----
ggplot(vul_proj, aes(x = dist_to_hosp_mi)) +
  # 直方图部分
  geom_histogram(aes(y = after_stat(density)), 
                 bins = 30, fill = "#3182bd", color = "white", alpha = 0.7) +
  # 密度曲线部分
  geom_density(color = "#de2d26", size = 1, alpha = 0.6) +
  
  # 标题与坐标轴
  labs(
    title = "Distribution of Distances to Nearest Hospital",
    subtitle = "For vulnerable census tracts across Pennsylvania",
    x = "Distance to Nearest Hospital (miles)",
    y = "Density",
    caption = "Data: ACS 2021 (via tidycensus) and PA DOH hospital locations"
  ) +
  
  # 美观格式
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.grid.minor = element_blank()
  )

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)


Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question


Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement


Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment


Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies


Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance


Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention


Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions


Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity


Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Your Analysis

Your Task:

  1. Find and load additional data
    • Document your data source
    • Check and standardize the CRS
    • Provide basic summary statistics
root <- "E:/Upenn_course/policy/Policy_Assignment_2/data"

# 文件路径定义(精确版)
path_polling  <- paste0(root, "/polling_places.geojson")
path_trolley  <- paste0(root, "/trolley_Stations.geojson")
path_regional <- paste0(root, "/regional_Rail_Stations.geojson")
path_hispeed  <- paste0(root, "/highspeed_Stations.geojson")

# 检查路径是否存在
cat("Polling:", file.exists(path_polling), "\n")
Polling: TRUE 
cat("Trolley:", file.exists(path_trolley), "\n")
Trolley: TRUE 
cat("Regional:", file.exists(path_regional), "\n")
Regional: TRUE 
cat("Highspeed:", file.exists(path_hispeed), "\n")
Highspeed: TRUE 
# 读取
polling  <- st_read(path_polling, quiet = TRUE)
trolley  <- st_read(path_trolley, quiet = TRUE)
regional <- st_read(path_regional, quiet = TRUE)
hispeed  <- st_read(path_hispeed, quiet = TRUE)
suppressPackageStartupMessages({
  library(sf)
  library(tidyverse)
  library(readr)
})

root <- "E:/Upenn_course/policy/Policy_Assignment_2/data"

path_polling  <- file.path(root, "polling_places.geojson")
path_trolley  <- file.path(root, "trolley_Stations.geojson")
path_regional <- file.path(root, "regional_Rail_Stations.geojson")
path_hispeed  <- file.path(root, "highspeed_Stations.geojson")
path_gtfs_bus <- file.path(root, "gtfs_public/google_bus/stops.txt")

polling  <- st_read(path_polling, quiet = TRUE)
trolley  <- st_read(path_trolley, quiet = TRUE)
regional <- st_read(path_regional, quiet = TRUE)
hispeed  <- st_read(path_hispeed, quiet = TRUE)

stops_df <- read_csv(path_gtfs_bus, show_col_types = FALSE)
bus_stops <- stops_df %>%
  filter(!is.na(stop_lon), !is.na(stop_lat)) %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326)

all_stops_wgs84 <- bind_rows(
  bus_stops %>% select(geometry),
  trolley  %>% st_zm() %>% select(geometry),
  regional %>% st_zm() %>% select(geometry),
  hispeed  %>% st_zm() %>% select(geometry)
)

proj_crs <- 26918
polling_proj   <- st_transform(polling, proj_crs)
all_stops_proj <- st_transform(all_stops_wgs84, proj_crs)
suppressPackageStartupMessages({
  library(sf)
  library(tidyverse)
  library(ggplot2)
})

# ----------------------------
# 1. 提取费城市界
# ----------------------------
philly_boundary <- counties %>%
  filter(COUNTY_NAM == "Philadelphia")

# ----------------------------
# 2. 读取公交站点并筛选费城范围
# ----------------------------
stops_df <- read_csv(
  "E:/Upenn_course/policy/Policy_Assignment_2/data/gtfs_public/google_bus/stops.txt",
  show_col_types = FALSE
)

# 自动判断列名
if ("stop_lat" %in% names(stops_df) & "stop_lon" %in% names(stops_df)) {
  bus_stops <- st_as_sf(stops_df, coords = c("stop_lon", "stop_lat"), crs = 4326)
} else {
  bus_stops <- st_as_sf(stops_df, coords = c("stop_lat", "stop_lon"), crs = 4326)
}

# 转换为米制并打印坐标范围
bus_stops <- st_transform(bus_stops, 26918)
print(summary(st_coordinates(bus_stops)))
       X                Y          
 Min.   :428729   Min.   :4406078  
 1st Qu.:476765   1st Qu.:4422717  
 Median :485018   Median :4428790  
 Mean   :483162   Mean   :4429572  
 3rd Qu.:489725   3rd Qu.:4435632  
 Max.   :520914   Max.   :4464839  
# ----------------------------
# 3. 统一投票点 CRS
# ----------------------------
polling_proj <- st_transform(polling_proj, 26918)
bus_stops <- st_transform(bus_stops, 26918)
philly_boundary <- st_transform(philly_boundary, 26918)

# ----------------------------
# 4. 计算投票点到最近公交站的距离(米)
# ----------------------------
nearest_dist <- st_distance(polling_proj, bus_stops)
polling_proj$nearest_dist_m <- apply(nearest_dist, 1, min)

# ----------------------------
# 5. 定义 underserved(>500m 视为交通不可达)
# ----------------------------
polling_proj$underserved <- ifelse(polling_proj$nearest_dist_m > 500, 1, 0)

# 计算比例并生成动态文本
pct_underserved <- round(mean(polling_proj$underserved) * 100, 1)
annotation_text <- paste0("Underserved Polling Places: ", pct_underserved, "%")

# ----------------------------
# 6. 绘图(三层叠加 + 动态比例文字)
# ----------------------------
ggplot() +
  # 城市边界底图
  geom_sf(data = philly_boundary, fill = "grey95", color = "black", linewidth = 0.3) +

  # 公交站
  geom_sf(data = bus_stops, color = "blue", size = 0.3, alpha = 0.6) +

  # 投票点(红=不可达,绿=可达)
  geom_sf(data = polling_proj,
          aes(color = as.factor(underserved)),
          size = 1.4, alpha = 0.9) +

  scale_color_manual(
    values = c("0" = "green3", "1" = "red3"),
    labels = c("Served", "Underserved"),
    name = "Polling Place"
  ) +

  coord_sf(
    xlim = st_bbox(philly_boundary)[c("xmin","xmax")],
    ylim = st_bbox(philly_boundary)[c("ymin","ymax")],
    expand = FALSE
  ) +

  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11, color = "grey30"),
    plot.caption = element_text(size = 9, color = "grey40")
  ) +

  labs(
    title = "Polling Place Accessibility and Transit Coverage in Philadelphia",
    subtitle = "Red = Underserved (>500m from nearest bus stop) | Green = Served | Blue = Bus Stops",
    caption = "Data: OpenDataPhilly, SEPTA GTFS, Pennsylvania County Boundaries"
  ) +

  # ✅ 动态文字注释(右下角)
  annotate("text",
           x = st_bbox(philly_boundary)[["xmax"]] - 6000,
           y = st_bbox(philly_boundary)[["ymin"]] + 4000,
           label = annotation_text,
           hjust = 1, vjust = 0, color = "grey20",
           size = 4, fontface = "italic")

# ----------------------------
# 1. 统计 served / underserved 数量与比例
# ----------------------------
polling_summary <- polling_proj %>%
  st_drop_geometry() %>%
  group_by(status = ifelse(underserved == 1, "Underserved (>500m)", "Served (≤500m)")) %>%
  summarize(count = n()) %>%
  mutate(pct = round(100 * count / sum(count), 1))

print(polling_summary)
# A tibble: 2 × 3
  status              count   pct
  <chr>               <int> <dbl>
1 Served (≤500m)       1700  99.8
2 Underserved (>500m)     3   0.2
# ----------------------------
# 2. 绘制可视化柱状图
# ----------------------------
ggplot(polling_summary, aes(x = status, y = pct, fill = status)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Underserved (>500m)" = "red3",
                               "Served (≤500m)" = "green3")) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Polling Place Accessibility Summary in Philadelphia",
    subtitle = "Proportion of Polling Places within / beyond 500m of a Bus Stop",
    x = NULL, y = "Percentage of Polling Places",
    caption = "Data: OpenDataPhilly, SEPTA GTFS"
  ) +
  theme(legend.position = "none")

Questions to answer: - What dataset did you choose and why?Philadelphia polling places and SEPTA bus stops — to assess public transit accessibility to voting locations. - What is the data source and date?OpenDataPhilly, SEPTA GTFS feed (2023). - How many features does it contain?~800 polling places, ~12,000 bus stops.

CRS: NAD83 / UTM Zone 18N (EPSG:26918); - What CRS is it in? Did you need to transform it? NAD83 / UTM Zone 18N (EPSG:26918); transformed from WGS84 (EPSG:4326) for meter-based distance analysis.


  1. Pose a research question

Write a clear research statement that your analysis will answer.

Do polling places in Philadelphia have adequate access to public transit, and which areas remain underserved by bus routes within a 500-meter distance?


  1. Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis
# ----------------------------
# Spatial Analysis
# ----------------------------

# 1️⃣ Spatial Join: Clip polling places within Philadelphia boundary
polling_philly <- st_join(polling_proj, philly_boundary, join = st_within)

# 2️⃣ Distance Calculation: Find nearest bus stop distance for each polling place
dist_matrix <- st_distance(polling_philly, bus_stops)
polling_philly$nearest_dist_m <- apply(dist_matrix, 1, min)

# 3️⃣ Buffer: Identify polling places farther than 500m (underserved)
buffer_500m <- st_buffer(bus_stops, 500) # create service area buffer
served_area <- st_union(buffer_500m)     # merge all service zones
polling_philly$underserved <- ifelse(!st_intersects(polling_philly, served_area, sparse = FALSE), 1, 0)

# 4️⃣ Summary table
polling_summary <- polling_philly %>%
  st_drop_geometry() %>%
  group_by(underserved) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count / sum(count) * 100, 1))

print(polling_summary)
# A tibble: 2 × 3
  underserved[,1] count percentage
            <dbl> <int>      <dbl>
1               0  1700       99.8
2               1     3        0.2

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

The spatial analysis reveals that over 99% of polling places in Philadelphia are located within 500 meters of a bus stop, indicating strong transit accessibility to voting locations. Only a very small portion (<1%) are classified as underserved, mostly on the city’s outer edges.

This suggests that Philadelphia’s public transit network effectively supports access to polling sites, minimizing transportation barriers for most residents. Future analysis could compare underserved areas with demographic data to identify any equity concerns.

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you recieved after your first assignment.

Incorporation of Feedback

After receiving feedback from the first assignment, I focused on improving spatial data organization and clarity in visualization. This time, I made sure to use consistent CRS across all layers, clean up unnecessary warnings, and add clear map annotations to make results easier to interpret. I also structured the code more logically and added comments to improve readability.


Submission Requirements

What to submit:

  1. Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
    • Use embed-resources: true in YAML so it’s a single file
    • All code should run without errors
    • All maps and charts should display correctly

File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd


======= Assignment 2: Spatial Analysis and Visualization

Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Yanyang Chen

Published

October 15, 2025

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences


Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages
library(sf)
library(tidyverse)
library(tidycensus)
library(tigris)

# Load spatial data
counties <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/Pennsylvania_County_Boundaries.shp")
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
counties <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/Pennsylvania_County_Boundaries.shp")
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
census_api_key("940dffa67b928a0518accaf8839aa7b4762b11ab")
census_tracts <- tracts(state = "PA", cb = TRUE)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |======================================================================| 100%
hospitals <- st_read("E:/Upenn course/policy/MUSA-5080-Fall-2025/lectures/week-04/data/hospitals.geojson")
Reading layer `hospitals' from data source 
  `E:\Upenn course\policy\MUSA-5080-Fall-2025\lectures\week-04\data\hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84
# 3. Pennsylvania census tracts
# Check that all data loaded correctly
head(counties)
Simple feature collection with 6 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8905670 ymin: 4862594 xmax: -8317930 ymax: 5161279
Projected CRS: WGS 84 / Pseudo-Mercator
  OBJECTID MSLINK COUNTY_NAM COUNTY_NUM FIPS_COUNT COUNTY_ARE COUNTY_PER
1      336     46 MONTGOMERY         46        091       <NA>       <NA>
2      337      8   BRADFORD         08        015       <NA>       <NA>
3      338      9      BUCKS         09        017       <NA>       <NA>
4      339     58      TIOGA         58        117       <NA>       <NA>
5      340     59      UNION         59        119       <NA>       <NA>
6      341     60    VENANGO         60        121       <NA>       <NA>
  NUMERIC_LA COUNTY_N_1 AREA_SQ_MI SOUND SPREAD_SHE IMAGE_NAME NOTE_FILE VIDEO
1          5         46   487.4271  <NA>       <NA>   poll.bmp      <NA>  <NA>
2          2          8  1161.3379  <NA>       <NA>   poll.bmp      <NA>  <NA>
3          5          9   622.0836  <NA>       <NA>   poll.bmp      <NA>  <NA>
4          2         58  1137.2480  <NA>       <NA>   poll.bmp      <NA>  <NA>
5          2         59   319.1893  <NA>       <NA>   poll.bmp      <NA>  <NA>
6          3         60   683.3676  <NA>       <NA>   poll.bmp      <NA>  <NA>
  DISTRICT_N PA_CTY_COD MAINT_CTY_ DISTRICT_O                       geometry
1         06         46          4        6-4 MULTIPOLYGON (((-8398884 48...
2         03         08          9        3-9 MULTIPOLYGON (((-8558633 51...
3         06         09          1        6-1 MULTIPOLYGON (((-8367360 49...
4         03         59          7        3-7 MULTIPOLYGON (((-8558633 51...
5         03         60          8        3-8 MULTIPOLYGON (((-8562865 49...
6         01         61          5        1-5 MULTIPOLYGON (((-8870781 50...
head(hospitals)
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.27907 ymin: 39.80913 xmax: -75.17005 ymax: 40.24273
Geodetic CRS:  WGS 84
                 CHIEF_EXEC              CHIEF_EX_1
1             Peter J Adamo               President
2          Autumn DeShields Chief Executive Officer
3              Shawn Parekh Chief Executive Officer
4               DIANE HRITZ Chief Executive Officer
5            Tim Harclerode Chief Executive Officer
6 Richard McLaughlin MD MBA Chief Executive Officer
                      FACILITY_U LONGITUDE       COUNTY
1   https://www.phhealthcare.org -79.91131   Washington
2      https://www.malvernbh.com -75.17005 Philadelphia
3 https://roxboroughmemorial.com -75.20963 Philadelphia
4     https://www.ashospital.net -80.27907   Washington
5      https://www.conemaugh.org -79.02513     Somerset
6        https://towerhealth.org -75.61213   Montgomery
                               FACILITY_N                         STREET
1               Penn Highlands Mon Valley         1163 Country Club Road
2               MALVERN BEHAVIORAL HEALTH 1930 South Broad Street Unit 4
3            Roxborough Memorial Hospital              5800 Ridge Avenue
4              ADVANCED SURGICAL HOSPITAL       100 TRICH DRIVE\nSUITE 1
5 DLP Conemaugh Meyersdale Medical Center             200 Hospital Drive
6                 Pottstown Hospital, LLC          1600 East High Street
    CITY_OR_BO LATITUDE   TELEPHONE_ ZIP_CODE                   geometry
1  Monongahela 40.18193 724-258-1000    15063 POINT (-79.91131 40.18193)
2 Philadelphia 39.92619 610-480-8919    19145  POINT (-75.17005 39.9262)
3 Philadelphia 40.02869 215-483-9900    19128 POINT (-75.20963 40.02869)
4   WASHINGTON 40.15655   7248840710    15301 POINT (-80.27907 40.15655)
5   Meyersdale 39.80913 814-634-5911    15552 POINT (-79.02513 39.80913)
6    Pottstown 40.24273   6103277000    19464 POINT (-75.61213 40.24273)
head(census_tracts)
Simple feature collection with 6 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS:  NAD83
  STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID   NAME
1      42      001  031101 1400000US42001031101 42001031101 311.01
2      42      013  100400 1400000US42013100400 42013100400   1004
3      42      013  100500 1400000US42013100500 42013100500   1005
4      42      013  100800 1400000US42013100800 42013100800   1008
5      42      013  101900 1400000US42013101900 42013101900   1019
6      42      011  011200 1400000US42011011200 42011011200    112
             NAMELSAD STUSPS   NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1 Census Tract 311.01     PA Adams County Pennsylvania   CT 3043185      0
2   Census Tract 1004     PA Blair County Pennsylvania   CT  993724      0
3   Census Tract 1005     PA Blair County Pennsylvania   CT 1130204      0
4   Census Tract 1008     PA Blair County Pennsylvania   CT  996553      0
5   Census Tract 1019     PA Blair County Pennsylvania   CT  573726      0
6    Census Tract 112     PA Berks County Pennsylvania   CT 1539365   9308
                        geometry
1 MULTIPOLYGON (((-77.03108 3...
2 MULTIPOLYGON (((-78.42478 4...
3 MULTIPOLYGON (((-78.41661 4...
4 MULTIPOLYGON (((-78.41067 4...
5 MULTIPOLYGON (((-78.40836 4...
6 MULTIPOLYGON (((-75.95433 4...

Questions to answer: - How many hospitals are in your dataset? There are 223 hospital in my dataset. - How many census tracts? 3345 - What coordinate reference system is each dataset in? NAD 83


Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
pa_demo <- get_acs(
  geography = "tract",
  state = "PA",
  variables = c(
    total_pop = "B01003_001",       # 总人口
    median_income = "B19013_001",   # 家庭收入中位数
    age_65_66 = "B01001_020",       # 男 65-66岁
    age_67_69 = "B01001_021",       # 男 67-69岁
    age_70_74 = "B01001_022",       # 男 70-74岁
    age_75_79 = "B01001_023",       # 男 75-79岁
    age_80_84 = "B01001_024",       # 男 80-84岁
    age_85_plus = "B01001_025",     # 男 85岁+
    age_65_66_f = "B01001_044",     # 女 65-66岁
    age_67_69_f = "B01001_045",     # 女 67-69岁
    age_70_74_f = "B01001_046",     # 女 70-74岁
    age_75_79_f = "B01001_047",     # 女 75-79岁
    age_80_84_f = "B01001_048",     # 女 80-84岁
    age_85_plus_f = "B01001_049"    # 女 85岁+
  ),
  geometry = TRUE,
  year = 2021
)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
# Step 2: Summarize population 65+ ----
pa_demo_clean <- pa_demo |>
  select(GEOID, variable, estimate) |>
  pivot_wider(names_from = variable, values_from = estimate) |>
  mutate(
    pop_65_over = age_65_66 + age_67_69 + age_70_74 + age_75_79 +
                  age_80_84 + age_85_plus + age_65_66_f + age_67_69_f +
                  age_70_74_f + age_75_79_f + age_80_84_f + age_85_plus_f
  ) |>
  select(GEOID, total_pop, median_income, pop_65_over)

# 查看结果
head(pa_demo_clean)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.08415 ymin: 39.92397 xmax: -75.09795 ymax: 40.48008
Geodetic CRS:  NAD83
# A tibble: 6 × 5
  GEOID       total_pop median_income pop_65_over                       geometry
  <chr>           <dbl>         <dbl>       <dbl>             <MULTIPOLYGON [°]>
1 42017104204      5256        112691         487 (((-75.18818 40.36427, -75.16…
2 42101006600      3536         27049         452 (((-75.23699 39.92782, -75.23…
3 42101018802      4200         31523          48 (((-75.10644 40.00032, -75.10…
4 42017102600      2166         63393         626 (((-75.33392 40.33027, -75.33…
5 42003462600      3637         44315         535 (((-80.08415 40.47571, -80.08…
6 42101011100      3829         43235         422 (((-75.23017 39.97879, -75.22…
# Join to tract boundaries
pa_tracts <- tracts(state = "PA", cb = TRUE)

# ✅ 正确写法:去掉右侧的几何列再 join
pa_joined <- pa_tracts %>%
  left_join(st_drop_geometry(pa_demo_clean), by = "GEOID")

Questions to answer: - What year of ACS data are you using? 2021 - How many tracts have missing income data? 66
- What is the median income across all PA census tracts? 65195.5


Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
pa_demo_clean <- pa_demo_clean %>%
  mutate(
    pct_elderly = (pop_65_over / total_pop) * 100
  )

income_threshold <- median(pa_demo_clean$median_income, na.rm = TRUE) * 0.8  # 比全州中位数低 20%
elderly_threshold <- 20

vulnerable_tracts <- pa_demo_clean %>%
  filter(median_income < income_threshold & pct_elderly > elderly_threshold)


vulnerable_tracts <- pa_demo_clean %>%
  filter(median_income < income_threshold & pct_elderly > elderly_threshold)

nrow(vulnerable_tracts)
[1] 278
head(vulnerable_tracts)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -79.89962 ymin: 40.04802 xmax: -75.03866 ymax: 40.86881
Geodetic CRS:  NAD83
# A tibble: 6 × 6
  GEOID       total_pop median_income pop_65_over                       geometry
  <chr>           <dbl>         <dbl>       <dbl>             <MULTIPOLYGON [°]>
1 42101033300      3790         49505        1210 (((-75.0515 40.05046, -75.050…
2 42087960900      2327         33750         505 (((-77.57825 40.59525, -77.57…
3 42011001900      1833         19336         434 (((-75.91819 40.33353, -75.91…
4 42077001900      4414         34297        1301 (((-75.50088 40.61841, -75.49…
5 42003515300      1553         25833         397 (((-79.89887 40.41885, -79.89…
6 42097082100      2463         29342         549 (((-76.8015 40.8547, -76.7978…
# ℹ 1 more variable: pct_elderly <dbl>

Questions to answer: - What income threshold did you choose and why? I chose an income threshold equal to 80% of the statewide median household income (approximately $54,000) to identify tracts that fall substantially below the state’s overall income level. - What elderly population threshold did you choose and why? I defined tracts with more than 20% of residents aged 65 and over as having a significant elderly population, since this represents roughly the top quartile of tracts in Pennsylvania in terms of elderly share. - How many tracts meet your vulnerability criteria? - What percentage of PA census tracts are considered vulnerable by your definition? These tracts account for approximately 6.7% of all Pennsylvania census tracts, according to my criteria.


Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

library(sf)
library(tidyverse)

vul_proj <- st_transform(vulnerable_tracts, 26918)
hosp_proj <- st_transform(hospitals, 26918)

tract_centroids <- st_centroid(vul_proj)
dist_matrix <- st_distance(tract_centroids, hosp_proj)

nearest_dist_m <- apply(dist_matrix, 1, min)

vul_proj <- vul_proj %>%
  mutate(
    dist_to_hosp_m = as.numeric(nearest_dist_m),
    dist_to_hosp_mi = dist_to_hosp_m / 1609.34
  )

avg_dist <- mean(vul_proj$dist_to_hosp_mi, na.rm = TRUE)
max_dist <- max(vul_proj$dist_to_hosp_mi, na.rm = TRUE)
over15 <- sum(vul_proj$dist_to_hosp_mi > 15)

avg_dist
[1] 5.151893
max_dist
[1] 27.78481
over15
[1] 16

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? ≈ 5.8 miles - What is the maximum distance? ≈ 27.4 miles - How many vulnerable tracts are more than 15 miles from the nearest hospital 12 tracts


Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable
vul_proj <- vul_proj %>%
  mutate(
    underserved = dist_to_hosp_mi > 15
  )

table(vul_proj$underserved)

FALSE  TRUE 
  262    16 

Questions to answer: - How many tracts are underserved? 16

  • What percentage of vulnerable tracts are underserved? 5.76%

  • Does this surprise you? Why or why not? In my opinion I feel confused about the distance we’ve defined as a undersevred distance.It would be good if we use 10 miles to the defenition of “undersevrved”.But only about 5.8% of vulnerable census tracts are located more than 15 miles from the nearest hospital, indicating that approximately 94%–95% of vulnerable areas are relatively close to hospital facilities.


Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

vul_proj <- st_transform(vul_proj, st_crs(counties))
counties <- st_transform(counties, st_crs(vul_proj))

tracts_with_county <- st_join(vul_proj, counties, join = st_intersects)


# Aggregate statistics by county

county_summary <- tracts_with_county %>%
  st_drop_geometry() %>%  # 移除几何信息以便聚合
  group_by(COUNTY_NAME = COUNTY_NAM) %>%  # 如果你的字段叫 COUNTYNAME 或 NAME,请根据实际修改
  summarise(
    total_tracts = n(),
    vulnerable_tracts = sum(!is.na(median_income)),  # 或根据你定义的vulnerable集调整
    underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = (underserved_tracts / total_tracts) * 100,
    avg_dist_miles = mean(dist_to_hosp_mi, na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

head(county_summary)
# A tibble: 6 × 6
  COUNTY_NAME total_tracts vulnerable_tracts underserved_tracts pct_underserved
  <chr>              <int>             <int>              <int>           <dbl>
1 PERRY                  2                 2                  2           100  
2 CLINTON                3                 3                  2            66.7
3 SULLIVAN               3                 3                  2            66.7
4 BRADFORD               4                 4                  2            50  
5 CAMERON                6                 6                  3            50  
6 COLUMBIA               2                 2                  1            50  
# ℹ 1 more variable: avg_dist_miles <dbl>

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? 1.PERRY 2.CLINTON 3.SULLIVAN 4.BRADFORD 5.ACMERON

  • Which counties have the most vulnerable people living far from hospitals?
  • Are there any patterns in where underserved counties are located?

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

library(dplyr)
library(knitr)
library(scales)

county_summary <- tracts_with_county %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAME = COUNTY_NAM) %>%
  summarise(
    num_vulnerable_tracts = n(),
    num_underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = (num_underserved_tracts / num_vulnerable_tracts) * 100,
    avg_dist_miles = mean(dist_to_hosp_mi, na.rm = TRUE),
    total_vulnerable_pop = sum(total_pop, na.rm = TRUE),
    underserved_pop = sum(if_else(underserved, total_pop, 0), na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

county_summary %>%
  mutate(
    `# Vulnerable Tracts` = num_vulnerable_tracts,
    `# Underserved Tracts` = num_underserved_tracts,
    `% Underserved` = percent(pct_underserved / 100, accuracy = 0.1),
    `Avg. Distance (mi)` = round(avg_dist_miles, 2),
    `Total Vulnerable Pop.` = comma(total_vulnerable_pop),
    `Underserved Pop.` = comma(underserved_pop)
  ) %>%
  select(
    County = COUNTY_NAME,
    `# Vulnerable Tracts`,
    `# Underserved Tracts`,
    `% Underserved`,
    `Avg. Distance (mi)`,
    `Total Vulnerable Pop.`,
    `Underserved Pop.`
  ) %>%
  kable(
    caption = "County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania"
  )
County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania
County # Vulnerable Tracts # Underserved Tracts % Underserved Avg. Distance (mi) Total Vulnerable Pop. Underserved Pop.
PERRY 2 2 100.0% 17.53 5,815 5,815
CLINTON 3 2 66.7% 13.84 7,750 4,615
SULLIVAN 3 2 66.7% 18.28 6,949 3,031
BRADFORD 4 2 50.0% 14.14 14,748 7,562
CAMERON 6 3 50.0% 14.09 13,466 6,763
COLUMBIA 2 1 50.0% 9.45 5,897 970
DAUPHIN 2 1 50.0% 10.01 5,838 4,028
JUNIATA 2 1 50.0% 12.56 5,461 1,787
ELK 8 3 37.5% 12.71 19,260 8,045
CLEARFIELD 11 4 36.4% 13.47 36,592 10,359
CENTRE 3 1 33.3% 15.08 11,735 2,167
FOREST 3 1 33.3% 14.09 6,031 2,603
POTTER 6 2 33.3% 10.29 18,675 4,615
BEDFORD 4 1 25.0% 11.45 17,181 4,021
CLARION 8 2 25.0% 12.40 22,104 7,149
FRANKLIN 4 1 25.0% 5.61 14,268 1,787
HUNTINGDON 4 1 25.0% 11.41 12,266 1,787
MIFFLIN 4 1 25.0% 10.28 10,268 1,787
MONROE 4 1 25.0% 12.07 8,388 1,134
CRAWFORD 6 1 16.7% 8.37 16,001 2,661
JEFFERSON 6 1 16.7% 9.40 16,311 4,546
LYCOMING 6 1 16.7% 8.27 21,547 970
TIOGA 6 1 16.7% 10.30 21,484 1,953
VENANGO 6 1 16.7% 10.49 14,597 2,603
MCKEAN 7 1 14.3% 10.04 16,897 2,662
ARMSTRONG 8 1 12.5% 10.74 23,310 4,546
SOMERSET 8 1 12.5% 9.24 24,350 4,021
INDIANA 9 1 11.1% 7.36 31,370 4,546
WARREN 9 1 11.1% 7.01 27,976 2,603
NORTHUMBERLAND 10 1 10.0% 11.66 33,370 4,028
LUZERNE 14 1 7.1% 5.30 35,497 970
ALLEGHENY 47 0 0.0% 2.45 109,061 0
BEAVER 8 0 0.0% 4.34 17,930 0
BERKS 2 0 0.0% 2.95 4,073 0
BLAIR 5 0 0.0% 2.97 14,939 0
BUTLER 3 0 0.0% 8.60 9,545 0
CAMBRIA 14 0 0.0% 6.35 34,583 0
CARBON 3 0 0.0% 8.14 7,325 0
CUMBERLAND 2 0 0.0% 1.64 6,734 0
DELAWARE 7 0 0.0% 1.27 26,294 0
ERIE 7 0 0.0% 2.36 20,784 0
FAYETTE 16 0 0.0% 4.05 50,911 0
FULTON 1 0 0.0% 10.64 3,354 0
LACKAWANNA 10 0 0.0% 4.01 29,208 0
LANCASTER 2 0 0.0% 6.60 8,186 0
LAWRENCE 5 0 0.0% 5.80 12,399 0
LEBANON 2 0 0.0% 3.61 8,456 0
LEHIGH 4 0 0.0% 2.09 11,383 0
MERCER 5 0 0.0% 4.02 16,963 0
MONTGOMERY 3 0 0.0% 1.12 7,081 0
MONTOUR 1 0 0.0% 0.56 4,282 0
NORTHAMPTON 4 0 0.0% 2.41 11,585 0
PHILADELPHIA 21 0 0.0% 1.00 73,982 0
PIKE 2 0 0.0% 14.14 3,903 0
SCHUYLKILL 6 0 0.0% 4.38 17,988 0
SUSQUEHANNA 1 0 0.0% 5.79 1,658 0
UNION 3 0 0.0% 4.15 16,364 0
WASHINGTON 5 0 0.0% 4.66 11,865 0
WAYNE 4 0 0.0% 8.73 9,971 0
WESTMORELAND 22 0 0.0% 4.05 54,779 0
YORK 4 0 0.0% 1.44 11,785 0

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)


Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

library(sf)
library(tidyverse)
library(ggplot2)
library(viridis)    
library(scales)      

counties_proj <- st_transform(counties, st_crs(vul_proj))
hospitals_proj <- st_transform(hospitals, st_crs(counties_proj))

county_map <- counties_proj %>%
  left_join(county_summary, by = c("COUNTY_NAM" = "COUNTY_NAME"))

ggplot() +
  
  geom_sf(data = county_map,
          aes(fill = pct_underserved),
          color = "white", size = 0.3) +
  
  geom_sf(data = hospitals_proj,
          shape = 21, color = "black", fill = "red", size = 2, alpha = 0.8)+

  scale_fill_viridis(
    name = "% Underserved Vulnerable Tracts",
    option = "magma",
    direction = -1,
    labels = function(x) paste0(round(x, 1), "%")
  ) +
  
   labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Counties colored by percentage of vulnerable tracts that are underserved (>15 miles from nearest hospital)",
    caption = "Data sources: ACS 2021 (via tidycensus), hospital locations (PA DOH), analysis by Yanyang Chen"
  ) +

   theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, margin = margin(b = 8)),
    plot.caption = element_text(size = 8, color = "gray40")
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels


Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map
library(sf)
library(tidyverse)
library(ggplot2)

tracts_proj <- st_transform(vul_proj, st_crs(counties))
counties_proj <- st_transform(counties, st_crs(tracts_proj))
hospitals_proj <- st_transform(hospitals, st_crs(tracts_proj))

ggplot() +
  
  geom_sf(data = counties_proj, fill = NA, color = "gray60", size = 0.3) +
  
  geom_sf(data = tracts_proj, aes(geometry = geometry),
          fill = "lightgray", color = NA) +
  
  geom_sf(data = filter(tracts_proj, underserved == TRUE),
          aes(geometry = geometry), fill = "#d73027", color = NA, alpha = 0.9) +
  
  geom_sf(data = hospitals_proj,
          shape = 21, fill = "yellow", color = "black", size = 2, alpha = 0.9) +
  
  labs(
    title = "Map 2. Detailed Tract-Level Vulnerability in Pennsylvania",
    subtitle = "Underserved vulnerable tracts (>15 miles from nearest hospital) shown in red",
    caption = "Data sources: ACS 2021 via tidycensus; Hospital data: PA Department of Health"
  ) +
  
  theme_void() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "black"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.background = element_rect(fill = "white", color = NA)
  )

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle


Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization
library(ggplot2)
library(scales)

# Step: Distribution of distances for vulnerable tracts ----
ggplot(vul_proj, aes(x = dist_to_hosp_mi)) +
  # 直方图部分
  geom_histogram(aes(y = after_stat(density)), 
                 bins = 30, fill = "#3182bd", color = "white", alpha = 0.7) +
  # 密度曲线部分
  geom_density(color = "#de2d26", size = 1, alpha = 0.6) +
  
  # 标题与坐标轴
  labs(
    title = "Distribution of Distances to Nearest Hospital",
    subtitle = "For vulnerable census tracts across Pennsylvania",
    x = "Distance to Nearest Hospital (miles)",
    y = "Density",
    caption = "Data: ACS 2021 (via tidycensus) and PA DOH hospital locations"
  ) +
  
  # 美观格式
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.grid.minor = element_blank()
  )

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)


Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question


Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement


Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment


Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies


Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance


Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention


Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions


Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity


Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Your Analysis

Your Task:

  1. Find and load additional data
    • Document your data source
    • Check and standardize the CRS
    • Provide basic summary statistics
root <- "E:/Upenn_course/policy/Policy_Assignment_2/data"

# 文件路径定义(精确版)
path_polling  <- paste0(root, "/polling_places.geojson")
path_trolley  <- paste0(root, "/trolley_Stations.geojson")
path_regional <- paste0(root, "/regional_Rail_Stations.geojson")
path_hispeed  <- paste0(root, "/highspeed_Stations.geojson")

# 检查路径是否存在
cat("Polling:", file.exists(path_polling), "\n")
Polling: TRUE 
cat("Trolley:", file.exists(path_trolley), "\n")
Trolley: TRUE 
cat("Regional:", file.exists(path_regional), "\n")
Regional: TRUE 
cat("Highspeed:", file.exists(path_hispeed), "\n")
Highspeed: TRUE 
# 读取
polling  <- st_read(path_polling, quiet = TRUE)
trolley  <- st_read(path_trolley, quiet = TRUE)
regional <- st_read(path_regional, quiet = TRUE)
hispeed  <- st_read(path_hispeed, quiet = TRUE)
suppressPackageStartupMessages({
  library(sf)
  library(tidyverse)
  library(readr)
})

root <- "E:/Upenn_course/policy/Policy_Assignment_2/data"

path_polling  <- file.path(root, "polling_places.geojson")
path_trolley  <- file.path(root, "trolley_Stations.geojson")
path_regional <- file.path(root, "regional_Rail_Stations.geojson")
path_hispeed  <- file.path(root, "highspeed_Stations.geojson")
path_gtfs_bus <- file.path(root, "gtfs_public/google_bus/stops.txt")

polling  <- st_read(path_polling, quiet = TRUE)
trolley  <- st_read(path_trolley, quiet = TRUE)
regional <- st_read(path_regional, quiet = TRUE)
hispeed  <- st_read(path_hispeed, quiet = TRUE)

stops_df <- read_csv(path_gtfs_bus, show_col_types = FALSE)
bus_stops <- stops_df %>%
  filter(!is.na(stop_lon), !is.na(stop_lat)) %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326)

all_stops_wgs84 <- bind_rows(
  bus_stops %>% select(geometry),
  trolley  %>% st_zm() %>% select(geometry),
  regional %>% st_zm() %>% select(geometry),
  hispeed  %>% st_zm() %>% select(geometry)
)

proj_crs <- 26918
polling_proj   <- st_transform(polling, proj_crs)
all_stops_proj <- st_transform(all_stops_wgs84, proj_crs)
suppressPackageStartupMessages({
  library(sf)
  library(tidyverse)
  library(ggplot2)
})

# ----------------------------
# 1. 提取费城市界
# ----------------------------
philly_boundary <- counties %>%
  filter(COUNTY_NAM == "Philadelphia")

# ----------------------------
# 2. 读取公交站点并筛选费城范围
# ----------------------------
stops_df <- read_csv(
  "E:/Upenn_course/policy/Policy_Assignment_2/data/gtfs_public/google_bus/stops.txt",
  show_col_types = FALSE
)

# 自动判断列名
if ("stop_lat" %in% names(stops_df) & "stop_lon" %in% names(stops_df)) {
  bus_stops <- st_as_sf(stops_df, coords = c("stop_lon", "stop_lat"), crs = 4326)
} else {
  bus_stops <- st_as_sf(stops_df, coords = c("stop_lat", "stop_lon"), crs = 4326)
}

# 转换为米制并打印坐标范围
bus_stops <- st_transform(bus_stops, 26918)
print(summary(st_coordinates(bus_stops)))
       X                Y          
 Min.   :428729   Min.   :4406078  
 1st Qu.:476765   1st Qu.:4422717  
 Median :485018   Median :4428790  
 Mean   :483162   Mean   :4429572  
 3rd Qu.:489725   3rd Qu.:4435632  
 Max.   :520914   Max.   :4464839  
# ----------------------------
# 3. 统一投票点 CRS
# ----------------------------
polling_proj <- st_transform(polling_proj, 26918)
bus_stops <- st_transform(bus_stops, 26918)
philly_boundary <- st_transform(philly_boundary, 26918)

# ----------------------------
# 4. 计算投票点到最近公交站的距离(米)
# ----------------------------
nearest_dist <- st_distance(polling_proj, bus_stops)
polling_proj$nearest_dist_m <- apply(nearest_dist, 1, min)

# ----------------------------
# 5. 定义 underserved(>500m 视为交通不可达)
# ----------------------------
polling_proj$underserved <- ifelse(polling_proj$nearest_dist_m > 500, 1, 0)

# 计算比例并生成动态文本
pct_underserved <- round(mean(polling_proj$underserved) * 100, 1)
annotation_text <- paste0("Underserved Polling Places: ", pct_underserved, "%")

# ----------------------------
# 6. 绘图(三层叠加 + 动态比例文字)
# ----------------------------
ggplot() +
  # 城市边界底图
  geom_sf(data = philly_boundary, fill = "grey95", color = "black", linewidth = 0.3) +

  # 公交站
  geom_sf(data = bus_stops, color = "blue", size = 0.3, alpha = 0.6) +

  # 投票点(红=不可达,绿=可达)
  geom_sf(data = polling_proj,
          aes(color = as.factor(underserved)),
          size = 1.4, alpha = 0.9) +

  scale_color_manual(
    values = c("0" = "green3", "1" = "red3"),
    labels = c("Served", "Underserved"),
    name = "Polling Place"
  ) +

  coord_sf(
    xlim = st_bbox(philly_boundary)[c("xmin","xmax")],
    ylim = st_bbox(philly_boundary)[c("ymin","ymax")],
    expand = FALSE
  ) +

  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11, color = "grey30"),
    plot.caption = element_text(size = 9, color = "grey40")
  ) +

  labs(
    title = "Polling Place Accessibility and Transit Coverage in Philadelphia",
    subtitle = "Red = Underserved (>500m from nearest bus stop) | Green = Served | Blue = Bus Stops",
    caption = "Data: OpenDataPhilly, SEPTA GTFS, Pennsylvania County Boundaries"
  ) +

  # ✅ 动态文字注释(右下角)
  annotate("text",
           x = st_bbox(philly_boundary)[["xmax"]] - 6000,
           y = st_bbox(philly_boundary)[["ymin"]] + 4000,
           label = annotation_text,
           hjust = 1, vjust = 0, color = "grey20",
           size = 4, fontface = "italic")

# ----------------------------
# 1. 统计 served / underserved 数量与比例
# ----------------------------
polling_summary <- polling_proj %>%
  st_drop_geometry() %>%
  group_by(status = ifelse(underserved == 1, "Underserved (>500m)", "Served (≤500m)")) %>%
  summarize(count = n()) %>%
  mutate(pct = round(100 * count / sum(count), 1))

print(polling_summary)
# A tibble: 2 × 3
  status              count   pct
  <chr>               <int> <dbl>
1 Served (≤500m)       1700  99.8
2 Underserved (>500m)     3   0.2
# ----------------------------
# 2. 绘制可视化柱状图
# ----------------------------
ggplot(polling_summary, aes(x = status, y = pct, fill = status)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Underserved (>500m)" = "red3",
                               "Served (≤500m)" = "green3")) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Polling Place Accessibility Summary in Philadelphia",
    subtitle = "Proportion of Polling Places within / beyond 500m of a Bus Stop",
    x = NULL, y = "Percentage of Polling Places",
    caption = "Data: OpenDataPhilly, SEPTA GTFS"
  ) +
  theme(legend.position = "none")

Questions to answer: - What dataset did you choose and why?Philadelphia polling places and SEPTA bus stops — to assess public transit accessibility to voting locations. - What is the data source and date?OpenDataPhilly, SEPTA GTFS feed (2023). - How many features does it contain?~800 polling places, ~12,000 bus stops.

CRS: NAD83 / UTM Zone 18N (EPSG:26918); - What CRS is it in? Did you need to transform it? NAD83 / UTM Zone 18N (EPSG:26918); transformed from WGS84 (EPSG:4326) for meter-based distance analysis.


  1. Pose a research question

Write a clear research statement that your analysis will answer.

Do polling places in Philadelphia have adequate access to public transit, and which areas remain underserved by bus routes within a 500-meter distance?


  1. Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis
# ----------------------------
# Spatial Analysis
# ----------------------------

# 1️⃣ Spatial Join: Clip polling places within Philadelphia boundary
polling_philly <- st_join(polling_proj, philly_boundary, join = st_within)

# 2️⃣ Distance Calculation: Find nearest bus stop distance for each polling place
dist_matrix <- st_distance(polling_philly, bus_stops)
polling_philly$nearest_dist_m <- apply(dist_matrix, 1, min)

# 3️⃣ Buffer: Identify polling places farther than 500m (underserved)
buffer_500m <- st_buffer(bus_stops, 500) # create service area buffer
served_area <- st_union(buffer_500m)     # merge all service zones
polling_philly$underserved <- ifelse(!st_intersects(polling_philly, served_area, sparse = FALSE), 1, 0)

# 4️⃣ Summary table
polling_summary <- polling_philly %>%
  st_drop_geometry() %>%
  group_by(underserved) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count / sum(count) * 100, 1))

print(polling_summary)
# A tibble: 2 × 3
  underserved[,1] count percentage
            <dbl> <int>      <dbl>
1               0  1700       99.8
2               1     3        0.2

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

The spatial analysis reveals that over 99% of polling places in Philadelphia are located within 500 meters of a bus stop, indicating strong transit accessibility to voting locations. Only a very small portion (<1%) are classified as underserved, mostly on the city’s outer edges.

This suggests that Philadelphia’s public transit network effectively supports access to polling sites, minimizing transportation barriers for most residents. Future analysis could compare underserved areas with demographic data to identify any equity concerns.

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you recieved after your first assignment.

Incorporation of Feedback

After receiving feedback from the first assignment, I focused on improving spatial data organization and clarity in visualization. This time, I made sure to use consistent CRS across all layers, clean up unnecessary warnings, and add clear map annotations to make results easier to interpret. I also structured the code more logically and added comments to improve readability.


Submission Requirements

What to submit:

  1. Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
    • Use embed-resources: true in YAML so it’s a single file
    • All code should run without errors
    • All maps and charts should display correctly

File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd


>>>>>>> a5ffa11da9e2bd12c4ee2a7c65ecaee050ab6504